Abstract: Random forests, introduced by Leo Breiman in 2001, are a very 
effective statistical method. The complex mechanism of the method makes 
theoretical analysis difficult. Therefore, a simplified version of random forests, 
called purely random forests, which can be theoretically handled more easily, 
has been considered. In this paper we introduce a variant of this kind of random 
forests, that we call purely uniformly random forests. In the context of regression 
problems with a one-dimensional predictor space, we show that both random 
trees and random forests reach minimax rate of convergence. In addition, we 
prove that compared to random trees, random forests improve accuracy by 
reducing the estimator variance by a factor of three fourths. 
Risk bounds for purely uniformly random forests.

Summary: Introduced by Leo Breiman in 2001, random forests are a high-performing statistical method. From a theoretical point of view, their analysis is difficult because of the complexity of the algorithm. To explain this performance, simplified versions of random forests, which are easier to analyze, have been introduced. These versions are called purely random forests. In this paper, we introduce another simplified version, which we call purely uniformly random forests. In a regression context with a single explanatory variable, we show that random trees as well as random forests reach the minimax rate of convergence. Moreover, we prove that random forests improve on the performance of random trees by reducing the variance of the associated estimators by a factor of three fourths.

Keywords: Random forests, Nonparametric regression, Rate of convergence, Randomization.
1 Introduction 

Random forests (RF), introduced by Leo Breiman in 2001 [3], are a very effective statistical method. They perform very well in many situations, for both regression and classification problems. The mathematical understanding of this good performance remains limited. As defined by Leo Breiman, a random forest is a collection of tree predictors $\{h(x, \Theta_l), 1 \le l \le q\}$, where $(\Theta_l)_{1 \le l \le q}$ are i.i.d. random vectors, and a random forest predictor is obtained by aggregating this collection of trees. In addition to consistency results, one of the main theoretical challenges is to explain why a random forest improves so much on the performance of a single tree.

In [3], Leo Breiman introduced a specific instance of random forest, called random forests-RI, which has been adopted in many fields as a reference method. Indeed, random forests-RI are simple to use, and are efficiently coded in the popular R package randomForest [11]. They are effective for a predictive goal and they can also be used for variable selection (see e.g. [6], [7]).

However, forests-RI are very difficult to handle theoretically. This is why 
people are interested in simplified versions, called purely random forests (PRF). 
The main difference is that in PRF, the splits of tree nodes are randomly drawn 
independently of the learning sample; while in random forests-RI, the splits are 
optimized using the learning sample. This independence between splits and 
learning sample makes mathematical analysis easier. In [?], Cutler and Zhao 
introduced PERT (Perfect Random Tree Ensemble) , an algorithm which builds 
some purely random forests, and illustrated its good performance on benchmark 
datasets. More recently Biau et al. [2] showed that both purely random trees 
and purely random forests are universally consistent. 

This paper examines another simple variant of random forests, which belongs to the so-called purely random forests family. We call it purely uniformly random forests and we analyze its risk, only in a regression framework with a one-dimensional predictor space. The main goal is to emphasize the gain of using a forest instead of a tree. The results of this paper are twofold: first we show that the risks of both purely uniformly random trees and forests reach the minimax rate of convergence on the class of Lipschitz functions; second we show that forests improve the variance term by a factor of three fourths while not increasing the bias.

The paper is organized as follows. Section 2 presents the model. Section 3 and Section 4 give some risk bounds for purely uniformly random trees and purely uniformly random forests respectively. Section 5 concludes the paper, while proofs are collected in Section 6.

2 Framework 

The framework we consider throughout the paper is the classical random design regression framework.

More precisely, consider a learning set $\mathcal{L}_n = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$ made of $n$ i.i.d. observations of a vector $(X, Y)$ from an unknown distribution. $Y$ is real-valued since we are in a regression framework. We consider the following statistical model:

\[ Y_i = s(X_i) + \varepsilon_i \quad \text{for } i = 1, \ldots, n. \qquad (1) \]

$s$ is the unknown regression function and the goal is to estimate $s$. We make the following assumptions on model (1):

• $X \in [0,1]$ with continuous density function $\mu$;

• $(\varepsilon_1, \ldots, \varepsilon_n)$ are i.i.d. observations of $\varepsilon$, independent of $\mathcal{L}_n$, with $E[\varepsilon] = 0$ and where $\mathrm{Var}(\varepsilon) = \sigma^2$ is assumed to be known.

Note that we deal only with a one-dimensional predictor space. 
This paper aims at comparing performances in estimating s using a single 
random tree and a random forest of a special kind, described in the next section. 



3 Risk bounds for Purely Uniformly Random 
Trees 

3.1 Tree definition 

The principle of Purely Uniformly Random Trees (PURT) is that we draw k 
uniform random variables, which form the partition of the input space [0,1]. 
Then we build a regressogram on this partition, that we call a tree. 

Note that, unlike purely random forests or random forests-RI, the tree structure of individual predictors is not obvious. This comes from the fact that in PURT the partition is not obtained in a recursive manner. Nevertheless we keep the vocabulary of trees and forests to distinguish individual predictors from aggregated ones.

Let us mention that, throughout the paper, we make a slight abuse of language. Indeed, we use the term random tree for the tree itself (as a graph), for the corresponding partition of [0, 1], and for the corresponding estimator.

More precisely, let $U = (U_1, \ldots, U_k)$ be $k$ i.i.d. random variables of uniform distribution on $[0,1]$, where $k$ is a natural integer which will depend on the number of observations $n$.

A Purely Uniformly Random Tree (PURT), associated with $U$, is defined for $x \in [0, 1]$ as:

\[ \hat{s}_U(x) = \sum_{j=0}^{k} \hat{\beta}_j \mathbf{1}_{U_{(j)} < x \le U_{(j+1)}} \]

where

\[ \hat{\beta}_j = \frac{1}{\#\{i : U_{(j)} < X_i \le U_{(j+1)}\}} \sum_{i : U_{(j)} < X_i \le U_{(j+1)}} Y_i \]

and $(U_{(1)}, \ldots, U_{(k)})$ is the ordered statistics of $(U_1, \ldots, U_k)$ and $U_{(0)} = 0$, $U_{(k+1)} = 1$. $\#E$ denotes the cardinality of the set $E$.

Remark 1 Let us mention that if $\#\{i : U_{(j)} < X_i \le U_{(j+1)}\} = 0$, we set $\hat{\beta}_j = 0$. However, as we will see below, our assumptions on $k$ and $n$ will make the probability of observing such an event tend to 0.
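The definition of a PURT can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`fit_regressogram`, `fit_purt`, `predict`) are hypothetical and not from the paper, and the random generator is passed in explicitly. Empty cells receive the value 0, following the convention of Remark 1.

```python
import bisect
import random
from typing import List, Sequence, Tuple


def fit_regressogram(splits: Sequence[float], xs: Sequence[float],
                     ys: Sequence[float]) -> List[float]:
    """Cell means (the beta_hat_j) for the partition given by sorted splits."""
    sums = [0.0] * (len(splits) + 1)
    counts = [0] * (len(splits) + 1)
    for x, y in zip(xs, ys):
        # Index j of the cell (U_(j), U_(j+1)] that contains x.
        j = bisect.bisect_left(splits, x)
        sums[j] += y
        counts[j] += 1
    # Convention of Remark 1: an empty cell gets the value 0.
    return [s / c if c > 0 else 0.0 for s, c in zip(sums, counts)]


def fit_purt(xs: Sequence[float], ys: Sequence[float], k: int,
             rng: random.Random) -> Tuple[List[float], List[float]]:
    """Draw k uniform split points on [0, 1] and fit the regressogram."""
    splits = sorted(rng.random() for _ in range(k))
    return splits, fit_regressogram(splits, xs, ys)


def predict(splits: Sequence[float], betas: Sequence[float], x: float) -> float:
    """Value of the tree estimator at x."""
    return betas[bisect.bisect_left(splits, x)]
```

Note that the split points are drawn without looking at the sample, which is exactly what makes the tree "purely" random.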
In addition, let us define, for $x \in [0, 1]$:

\[ \tilde{s}_U(x) = \sum_{j=0}^{k} \beta_j \mathbf{1}_{U_{(j)} < x \le U_{(j+1)}} \]

where

\[ \beta_j = E[Y \mid U_{(j)} < X \le U_{(j+1)}]. \]

Conditionally on $U$, $\tilde{s}_U$ is the best approximation of $s$ among all the regressograms based on $U$, but of course it depends on the unknown distribution of $(X, Y)$.

With these notations, we can write a bias-variance decomposition of the quadratic risk of $\hat{s}_U$ as follows:

\[ E[(\hat{s}_U(X) - s(X))^2] = E[(\hat{s}_U(X) - \tilde{s}_U(X))^2] + E[(\tilde{s}_U(X) - s(X))^2] \qquad (2) \]

\[ = \text{variance term} + \text{bias term} \]

To clarify these variance and bias terms, we emphasize that for a given partition $u$ and a given $x$, we have

\[ E[\hat{s}_u(x)] = \tilde{s}_u(x) \]

so $E[(\hat{s}_u(x) - \tilde{s}_u(x))^2]$ is the variance of the estimator $\hat{s}_u(x)$ and $E[(\tilde{s}_u(x) - s(x))^2]$ is its squared bias. We then integrate with respect to (w.r.t.) $X$ and $U$ to get decomposition (2).

3.2 Variance of a tree 

We first deal with the variance term of decomposition (2). We work conditionally on $U$, so the problem reduces to the case of a regressogram on a deterministic partition, and we can apply the following proposition, which comes from Arlot [1].

Proposition 1 Conditionally on $U$, the variance term of decomposition (2) satisfies:

\[ E[(\hat{s}_U(X) - \tilde{s}_U(X))^2 \mid U] = \frac{1}{n} \sum_{j=0}^{k} (1 + \delta_{n,p_j})(\sigma^2 + (\sigma_j^d)^2) \qquad (3) \]

where

• $p_j = P(U_{(j)} < X \le U_{(j+1)} \mid U)$,

• $(\sigma_j^d)^2 = E[(s(X) - \tilde{s}_U(X))^2 \mid U_{(j)} < X \le U_{(j+1)}]$,

• $\delta_{n,p} > 0$.
We now integrate equation (3) w.r.t. $U$, and we get the following equality:

\[ E[(\hat{s}_U(X) - \tilde{s}_U(X))^2] = \frac{1}{n} \sum_{j=0}^{k} \left( \sigma^2 + \sigma^2 E[\delta_{n,p_j}] + E[(\sigma_j^d)^2] + E[(\sigma_j^d)^2 \delta_{n,p_j}] \right) \qquad (4) \]

Let us stress that equation (4) is general, since it does not depend on the distribution of $U$. Hence, it can be used for any random partition distribution.

Finally, using the fact that, in our case, $U$ is made of $k$ i.i.d. random variables of uniform distribution on $[0,1]$, we deduce from equation (4) the following proposition:

Proposition 2 If $k \to +\infty$ and $k/n \to 0$ as $n \to +\infty$, $\mu > 0$ and $s$ is $C$-Lipschitz, the variance of a PURT satisfies:

\[ E[(\hat{s}_U(X) - \tilde{s}_U(X))^2] = \frac{\sigma^2 (k+1)}{n} + o\left(\frac{k}{n}\right) \qquad (5) \]

where the notation $o(k/n)$ denotes a function $f(n)$ such that $f(n)/(k/n) \to 0$ as $n \to +\infty$.

Details of the proof of Proposition 2 can be found in Section 6.1.

The first two hypotheses of Proposition 2 ($k \to +\infty$, $k/n \to 0$) are the same natural conditions found by Biau et al. [2] for consistency of PRF. They require the number of splits of the tree to grow to infinity, but more slowly than the number of observations.

3.3 Bias of a tree 

We now turn to the bias term of decomposition (2). Direct calculations (see Section 6.2 for details) lead to the following upper bound for the bias term of a PURT:

Proposition 3 If $\mu$ is bounded by $M > 0$ and $s$ is $C$-Lipschitz, the bias of a PURT is upper bounded by:

\[ E[(\tilde{s}_U(X) - s(X))^2] \le \frac{6MC^2}{(k+1)^2} \qquad (6) \]



3.4 Risk bounds for a tree 

Putting together (5) and (6) leads to the following risk bound for a PURT.

Theorem 1 If $k \to +\infty$ and $k/n \to 0$ as $n \to +\infty$, $0 < \mu \le M$ and $s$ is $C$-Lipschitz, the risk of a PURT satisfies:

\[ E[(\hat{s}_U(X) - s(X))^2] \le \frac{\sigma^2 (k+1)}{n} + \frac{6MC^2}{(k+1)^2} + o\left(\frac{k}{n}\right) \qquad (7) \]

Balancing the first two terms of the right-hand side (r.h.s.) of (7) leads to taking $(k+1) = n^{1/3}$, and gives the following upper bound for the risk of a PURT.

Corollary 1 Under the assumptions of Theorem 1,

\[ E[(\hat{s}_U(X) - s(X))^2] \le K n^{-2/3} + o(n^{-2/3}) \]

where $K$ is a positive constant.

Therefore, a PURT reaches the minimax rate of convergence associated with the class of Lipschitz functions (see e.g. Ibragimov and Khasminskii [10]).
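The balance between variance and bias in the tree risk bound can be checked numerically. The sketch below evaluates only the two leading terms of the bound (the negligible $o(k/n)$ term is dropped); the function names and the choice of rounding $n^{1/3}$ to the nearest integer are assumptions made for illustration.

```python
def tree_risk_bound(n: int, k: int, sigma2: float, M: float, C: float) -> float:
    """Leading terms of the tree risk bound: sigma^2 (k+1)/n + 6 M C^2/(k+1)^2."""
    variance_term = sigma2 * (k + 1) / n
    bias_term = 6.0 * M * C ** 2 / (k + 1) ** 2
    return variance_term + bias_term


def balanced_k(n: int) -> int:
    """Number of splits k such that k + 1 is the integer nearest to n^(1/3)."""
    return max(1, round(n ** (1.0 / 3.0)) - 1)


n = 1000
k = balanced_k(n)                             # k + 1 = 10
bound = tree_risk_bound(n, k, 1.0, 1.0, 1.0)  # 10/1000 + 6/100
rate = n ** (-2.0 / 3.0)                      # the n^(-2/3) rate, about 0.01
```

With $n = 1000$ and $\sigma^2 = M = C = 1$, both leading terms are of order $n^{-2/3}$, which is the point of choosing $(k+1) = n^{1/3}$.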

Let us now analyze purely uniformly random forests. As a result, we em- 
phasize an improvement given by a forest compared to a single tree. 

4 Risk bounds for Purely Uniformly Random 
Forests 

4.1 Forest definition 

A random forest is the aggregation of a collection of random trees. So, in the context of Purely Uniformly Random Forests (PURF), the principle is to generate several PUR Trees by drawing several random partitions given by uniform random variables, and to aggregate them.

Let $V = (U^1, \ldots, U^q)$ be $q$ i.i.d. random vectors of the same distribution as $U$ (defined in Section 3.1). That is, for $l = 1, \ldots, q$, $U^l = (U^l_1, \ldots, U^l_k)$ where the $(U^l_j)_{1 \le j \le k}$ are i.i.d. random variables of uniform distribution on $[0, 1]$.

A PURF, associated with $V$, is defined for $x \in [0, 1]$ as follows:

\[ \hat{s}(x) = \frac{1}{q} \sum_{l=1}^{q} \hat{s}_{U^l}(x). \]

Let us define, for $x \in [0, 1]$:

\[ \tilde{s}(x) = \frac{1}{q} \sum_{l=1}^{q} \tilde{s}_{U^l}(x). \]

Again, we have a bias-variance decomposition of the quadratic risk of $\hat{s}$, given by:

\[ E[(\hat{s}(X) - s(X))^2] = E[(\hat{s}(X) - \tilde{s}(X))^2] + E[(\tilde{s}(X) - s(X))^2] \qquad (8) \]

\[ = \text{variance term} + \text{bias term} \]
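As an illustration of the forest definition, the following sketch aggregates $q$ independently drawn trees by averaging their predictions. It is a minimal example, not the author's implementation: the names `fit_tree`, `fit_purf` and `predict_forest` are hypothetical, and each tree is stored as a pair (sorted split points, cell means).

```python
import bisect
import random
from typing import List, Sequence, Tuple

Tree = Tuple[List[float], List[float]]  # (sorted split points, cell means)


def fit_tree(splits: Sequence[float], xs: Sequence[float],
             ys: Sequence[float]) -> Tree:
    """Regressogram on the partition of [0, 1] induced by the split points."""
    ordered = sorted(splits)
    sums = [0.0] * (len(ordered) + 1)
    counts = [0] * (len(ordered) + 1)
    for x, y in zip(xs, ys):
        j = bisect.bisect_left(ordered, x)  # cell (U_(j), U_(j+1)]
        sums[j] += y
        counts[j] += 1
    # Empty cells are set to 0.
    betas = [s / c if c > 0 else 0.0 for s, c in zip(sums, counts)]
    return ordered, betas


def fit_purf(xs: Sequence[float], ys: Sequence[float], k: int, q: int,
             rng: random.Random) -> List[Tree]:
    """q trees, each built on k fresh i.i.d. uniform split points."""
    return [fit_tree([rng.random() for _ in range(k)], xs, ys)
            for _ in range(q)]


def predict_forest(forest: Sequence[Tree], x: float) -> float:
    """Average of the tree predictions at x."""
    total = sum(betas[bisect.bisect_left(splits, x)]
                for splits, betas in forest)
    return total / len(forest)
```

All trees share the same learning sample; only the partitions differ from one tree to the next.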
4.2 Variance of a forest 

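We first deal with the variance term of decomposition (8). We begin by showing that, when the number of trees $q$ grows to infinity, the variance of a PURF is close to the covariance between two PURT.

Indeed, since $\hat{s}(x) = \frac{1}{q} \sum_{l=1}^{q} \hat{s}_{U^l}(x)$ and $\tilde{s}(x) = \frac{1}{q} \sum_{l=1}^{q} \tilde{s}_{U^l}(x)$, the variance term satisfies:

\[ E[(\hat{s}(X) - \tilde{s}(X))^2] = \frac{1}{q^2} \sum_{l=1}^{q} E[(\hat{s}_{U^l}(X) - \tilde{s}_{U^l}(X))^2] + \frac{1}{q^2} \sum_{l \ne m} E[(\hat{s}_{U^l}(X) - \tilde{s}_{U^l}(X))(\hat{s}_{U^m}(X) - \tilde{s}_{U^m}(X))] \]

\[ = \frac{1}{q} E[(\hat{s}_{U^1}(X) - \tilde{s}_{U^1}(X))^2] + \left(1 - \frac{1}{q}\right) E[(\hat{s}_{U^1}(X) - \tilde{s}_{U^1}(X))(\hat{s}_{U^2}(X) - \tilde{s}_{U^2}(X))] \]

where the last equality comes from the fact that the $(\hat{s}_{U^l}(X) - \tilde{s}_{U^l}(X))_{1 \le l \le q}$ are of the same distribution.

Now, if we let $q$ grow to infinity, we get:

\[ E[(\hat{s}(X) - \tilde{s}(X))^2] = E[(\hat{s}_{U^1}(X) - \tilde{s}_{U^1}(X))(\hat{s}_{U^2}(X) - \tilde{s}_{U^2}(X))] + o(1) \quad \text{as } q \to +\infty. \]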
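The passage from a finite number of trees to the limit can be made concrete with the standard identity for the variance of an average of $q$ identically distributed, exchangeable terms: one part in $q$ of the single-tree variance plus $(1 - 1/q)$ times the covariance between two trees. The sketch below assumes that identity; the function name and the numerical values are purely illustrative.

```python
def forest_variance(var_tree: float, cov_trees: float, q: int) -> float:
    """Variance of the average of q exchangeable trees.

    var_tree is the variance term of a single tree and cov_trees the
    covariance between two distinct trees (both assumed given).
    """
    return var_tree / q + (1.0 - 1.0 / q) * cov_trees


# Illustrative values: tree variance normalised to 1, covariance equal to 3/4.
single = forest_variance(1.0, 0.75, 1)        # one tree: the tree variance
small = forest_variance(1.0, 0.75, 4)         # 0.25 + 0.75 * 0.75
large = forest_variance(1.0, 0.75, 10 ** 6)   # close to the covariance
```

As $q$ grows, the first term vanishes and only the covariance between two trees remains, which is why the analysis reduces to bounding that covariance.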

The next step is to upper bound the covariance between two PURT

\[ E[(\hat{s}_{U^1}(X) - \tilde{s}_{U^1}(X))(\hat{s}_{U^2}(X) - \tilde{s}_{U^2}(X))] \]

(it is detailed in Section 6.3) and it leads to the following theorem, which gives the behavior of the variance of a PURF:

Theorem 2 If $k \to +\infty$ and $k/n \to 0$ as $n \to +\infty$, $\mu > 0$, $s$ is $C$-Lipschitz and $q \to +\infty$, the variance of a PURF satisfies the following upper bound:

\[ E[(\hat{s}(X) - \tilde{s}(X))^2] \le \frac{3}{4} \frac{\sigma^2 (k+1)}{n} + o\left(\frac{k}{n}\right) \qquad (9) \]

Theorem 2 is to be compared with Proposition 2 and tells us that the variance of a PUR Forest is upper bounded by three fourths times the variance of a PUR Tree. So, the rate of decay (in terms of power of $n$) of the PUR Forest variance is the same as that of the PUR Tree variance, and the actual gain appears in the multiplicative constant.

We mention that, as in the analysis of the variance of a tree (see equation (4)), we derive, in the proof of Theorem 2, a general statement (see equation (13) in Section 6.3) which does not depend on the distribution of the partition defining the random trees.

Let us, finally, comment on the hypotheses of Theorem 2. First, note that the hypotheses on $k$ and $n$ are the same as in Proposition 2, which allows a fair comparison between the two results. Second, the hypothesis on $q$ ensures that the upper bound on the covariance (given by Corollary 3 in Section 6.3) leads to the same upper bound for the variance of the forest. Finally, the other hypotheses ($\mu > 0$, $s$ is $C$-Lipschitz) are the same as in Proposition 2 and help to control negligible terms.



4.3 Bias of a forest 

We now deal with the bias term of decomposition (8). A convexity inequality shows that the bias of a forest is not larger than the bias of a single tree:

\[ E[(\tilde{s}(X) - s(X))^2] \le \frac{1}{q} \sum_{l=1}^{q} E[(\tilde{s}_{U^l}(X) - s(X))^2] = E[(\tilde{s}_{U^1}(X) - s(X))^2]. \]

So from Proposition 3 we deduce that:

Proposition 4 If $\mu$ is bounded by $M > 0$ and $s$ is $C$-Lipschitz, the bias of a PURF satisfies the same inequality as (6), that is:

\[ E[(\tilde{s}(X) - s(X))^2] \le \frac{6MC^2}{(k+1)^2} \qquad (10) \]



4.4 Risk bounds for a forest 

Putting together (9) and (10) leads to the following risk bound for a PURF.

Theorem 3 If $k \to +\infty$ and $k/n \to 0$ as $n \to +\infty$, $0 < \mu \le M$, $s$ is $C$-Lipschitz and $q \to +\infty$, the risk of a PURF satisfies:

\[ E[(\hat{s}(X) - s(X))^2] \le \frac{3}{4} \frac{\sigma^2 (k+1)}{n} + \frac{6MC^2}{(k+1)^2} + o\left(\frac{k}{n}\right) \]

Again, taking $(k+1) = n^{1/3}$ gives the upper bound for the risk:

Corollary 2 Under the assumptions of Theorem 3,

\[ E[(\hat{s}(X) - s(X))^2] \le K n^{-2/3} + o(n^{-2/3}) \]

where $K$ is a positive constant.

So, a PURF reaches the minimax rate of convergence for $C$-Lipschitz functions.

Moreover, as the variance of a PUR Forest is systematically reduced compared to a PUR Tree and the bias of a PUR Forest is not larger than that of a PUR Tree, the risk of a PUR Forest is actually lower.
5 Conclusion 

We have emphasized, for a very simple version of random forests, the actual gain of using a random forest instead of a single random tree. First, we showed that both trees and forests reach the minimax rate of convergence. Then, we highlighted a reduction of the variance of a forest, compared to the variance of a tree. This is, in this specific context, a proof of the well-known conjecture for random forests: "a random forest, by aggregating several random trees, reduces variance and leaves the bias unchanged", which can be found for example in Hastie et al. [9].

An interesting open problem would be to generalize this result so that it handles more complex versions of random forests and relaxes the hypotheses we made here. Obviously, a more ambitious goal would be to give some precise insights explaining the outstanding performance of random forests-RI.



6 Proofs 

6.1 Proof of Proposition 2

We must show that the three last terms in the sum of equation (4) are negligible compared to the constant term $\sigma^2$.

Let us fix $0 \le j \le k$. As can be found e.g. in Chapter 6 of [5], the probability density function of $U_{(j+1)} - U_{(j)}$ is the function $t \in [0, 1] \mapsto k(1 - t)^{k-1}$.

• For the second term $E[\delta_{n,p_j}]$:

from [1] we have $\delta_{n,p_j} \le \kappa_3 (n p_j)^{-1/4}$ where $\kappa_3$ is a positive constant. So,

\[ E[\delta_{n,p_j}] \le \kappa_3 (n m)^{-1/4} E[(U_{(j+1)} - U_{(j)})^{-1/4}] \le \kappa_4 \left(\frac{n}{k}\right)^{-1/4} \]

where $m = \min_{[0,1]} \mu$ and $\kappa_4$ is another positive constant.

Since $k/n \to 0$, the last upper bound tends to 0 as $n$ tends to infinity.

• For the third term $E[(\sigma_j^d)^2]$:

\[ (\sigma_j^d)^2 = E[(s(X) - \tilde{s}_U(X))^2 \mid U_{(j)} < X \le U_{(j+1)}] \le C^2 (U_{(j+1)} - U_{(j)})^2 \]

because $s$ is $C$-Lipschitz. So,

\[ E[(\sigma_j^d)^2] \le C^2 E[(U_{(j+1)} - U_{(j)})^2] = \frac{2C^2}{(k+1)(k+2)} \]

which tends to 0 as $k$ tends to infinity.

• For the last term, the following inequality is sufficient to conclude:

\[ E[(\sigma_j^d)^2 \delta_{n,p_j}] \le C^2 E[\delta_{n,p_j}], \quad \text{because } U_{(j+1)} - U_{(j)} \le 1. \]
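6.2 Proof of Proposition 3

Function $s$ is supposed to be $C$-Lipschitz, so

\[ E[(\tilde{s}_U(X) - s(X))^2] = E\left[\left(\sum_{j=0}^{k} (s(X) - \beta_j) \mathbf{1}_{U_{(j)} < X \le U_{(j+1)}}\right)^2\right] \]

\[ = E\left[\sum_{j=0}^{k} (s(X) - \beta_j)^2 \mathbf{1}_{U_{(j)} < X \le U_{(j+1)}}\right] \]

\[ \le C^2 E\left[\sum_{j=0}^{k} (U_{(j+1)} - U_{(j)})^2 \mathbf{1}_{U_{(j)} < X \le U_{(j+1)}}\right] \]

\[ \le C^2 E\left[\sum_{j=0}^{k} M (U_{(j+1)} - U_{(j)})^3\right] \quad \text{because } \mu \text{ is bounded by } M \]

\[ = MC^2 \sum_{j=0}^{k} E[(U_{(j+1)} - U_{(j)})^3] \]

\[ = \frac{6MC^2}{(k+2)(k+3)} \]

\[ \le \frac{6MC^2}{(k+1)^2}. \]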
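The moments of the spacings used in the proofs of Propositions 2 and 3 can be verified with exact rational arithmetic. The sketch below assumes only the density $t \mapsto k(1-t)^{k-1}$ quoted above, for which the $m$-th moment is $k \, B(m+1, k) = m! \, k! / (k+m)!$; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def spacing_moment(k: int, m: int) -> Fraction:
    """E[T^m] when T has density k (1 - t)^(k - 1) on [0, 1].

    Equals k * B(m + 1, k) = m! k! / (k + m)!.
    """
    return Fraction(factorial(m) * factorial(k), factorial(k + m))


def summed_third_moment(k: int) -> Fraction:
    """Sum over the k + 1 cells of E[(U_(j+1) - U_(j))^3]."""
    return (k + 1) * spacing_moment(k, 3)


# Second moment: 2 / ((k + 1)(k + 2)), used for the third term of equation (4).
second = spacing_moment(4, 2)        # Fraction(1, 15)
# Summed third moments: 6 / ((k + 2)(k + 3)), used for the bias bound.
third_sum = summed_third_moment(4)   # Fraction(1, 7)
```

Multiplying the summed third moment by $MC^2$ gives $6MC^2/((k+2)(k+3))$, which is at most $6MC^2/(k+1)^2$.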

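6.3 Proof of Theorem 2

Before entering into the details of the proof of Theorem 2, we recall that in the proof of Proposition 1 (which can be found in [1]), calculations lead to the following equality:

\[ E[(\hat{s}_U(X) - \tilde{s}_U(X))^2 \mid U] = \sum_{j=0}^{k} p_j E\left[\frac{1}{n \hat{p}_j}\right] (\sigma^2 + (\sigma_j^d)^2) \qquad (11) \]

where $\hat{p}_j = \#\{i : U_{(j)} < X_i \le U_{(j+1)}\} / n$.

Then, an estimation of $p_j E\left[\frac{1}{n \hat{p}_j}\right]$ gives the expression $\frac{1}{n}(1 + \delta_{n,p_j})$ in Proposition 1.

We denote by

\[ \mathrm{Var}_j = p_j E\left[\frac{1}{n \hat{p}_j}\right] (\sigma^2 + (\sigma_j^d)^2) \qquad (12) \]

a generic term of the sum in the r.h.s. of (11).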



We now address the proof of Theorem 2. We begin by introducing some notation and establishing an intermediate result. The following proposition is not only useful to prove Theorem 2 but is also of interest in its own right. Indeed, it gives a general upper bound (to be compared to equation (3)) which does not depend on the distribution of the random partitions defining the trees.

In the sequel we denote the covariance between two PURT by:

\[ C(\hat{s}_{U^1}, \hat{s}_{U^2}) = E[(\hat{s}_{U^1}(X) - \tilde{s}_{U^1}(X))(\hat{s}_{U^2}(X) - \tilde{s}_{U^2}(X))] \]

Let us consider $U^1 = (U^1_1, \ldots, U^1_k)$ and $U^2 = (U^2_1, \ldots, U^2_k)$ two sequences of i.i.d. uniform random variables, with respective ordered statistics $(U^1_{(1)}, \ldots, U^1_{(k)})$ and $(U^2_{(1)}, \ldots, U^2_{(k)})$.

Then we denote by $(V_{(1)}, \ldots, V_{(2k)})$ the ordered statistics of the complete vector $(U^1_1, \ldots, U^1_k, U^2_1, \ldots, U^2_k)$, with $V_{(0)} = 0$ and $V_{(2k+1)} = 1$.

$(\Sigma_t^{d,1,2})^2$ denotes a sum of terms $E[(\tilde{s}_{U^1}(X) - s(X))(\tilde{s}_{U^2}(X) - s(X)) \mid V_{(t')} < X \le V_{(t'+1)}]$ for several consecutive values of $t'$.

Finally $p_t$ denotes, for some $j \in \{0, \ldots, k\}$, either $p^1_j$ or $p^2_j$ depending on the relative positions of the $(U^1_1, \ldots, U^1_k)$ and the $(U^2_1, \ldots, U^2_k)$ in $(V_{(1)}, \ldots, V_{(2k)})$ (see details below).

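Proposition 5 The covariance between two PURT satisfies the following upper bound:

\[ C(\hat{s}_{U^1}, \hat{s}_{U^2}) \le \frac{1}{n} E\left[\sum_{t=0}^{N_{12}-1} (1 + \delta_{n,p_t})(\sigma^2 + (\Sigma_t^{d,1,2})^2)\right] \qquad (13) \]

where

\[ N_{12} = k + 1 - \sum_{r=1}^{k-2} \sum_{s=1}^{k-1} \mathbf{1}_{U^2_{(s)} < U^1_{(r)} < U^1_{(r+1)} < U^1_{(r+2)} < U^2_{(s+1)}}. \]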

Remark 2 The gain in variance for a PURF comes from the fact that the number of terms in the sum of equation (13) is smaller than $k + 1$. Indeed, it is $k + 1 - M_{1,2}$ where $M_{1,2}$ is the number of times that 3 consecutive ordered statistics of $U^1$ are included in 2 consecutive ordered statistics of $U^2$.
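To make the counting in Remark 2 concrete, the following sketch computes $M_{1,2}$ and $N_{12}$ for two given vectors of split points. It assumes the reading of the definition in which three consecutive ordered points of $U^1$ must lie strictly between two consecutive ordered points of $U^2$ (boundary cells excluded, as the sum over $s$ runs from 1 to $k-1$); the function names are hypothetical.

```python
import bisect
from typing import Sequence


def count_m12(u1: Sequence[float], u2: Sequence[float]) -> int:
    """Number of r with U2_(s) < U1_(r) < U1_(r+1) < U1_(r+2) < U2_(s+1)."""
    a = sorted(u1)
    b = sorted(u2)
    count = 0
    for r in range(len(a) - 2):
        # s = number of points of U2 strictly below a[r]; b[s] is U2_(s+1).
        s = bisect.bisect_left(b, a[r])
        # Both neighbours must be actual points of U2: 1 <= s <= k - 1.
        if 1 <= s <= len(b) - 1 and a[r + 2] < b[s]:
            count += 1
    return count


def count_n12(u1: Sequence[float], u2: Sequence[float]) -> int:
    """N_12 = k + 1 - M_12, the number of terms left in the covariance bound."""
    return len(u1) + 1 - count_m12(u1, u2)


# Example: 0.3 < 0.4 < 0.5 all lie between 0.1 and 0.6, so M_12 = 1.
u1 = [0.3, 0.4, 0.5, 0.9]
u2 = [0.1, 0.6, 0.7, 0.8]
m12 = count_m12(u1, u2)
n12 = count_n12(u1, u2)
```

In this example $k = 4$, so one of the five terms of the tree bound is saved.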

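We now prove inequality (13) of Proposition 5. The term $(\hat{s}_{U^1}(X) - \tilde{s}_{U^1}(X))(\hat{s}_{U^2}(X) - \tilde{s}_{U^2}(X))$ equals, by definition:

\[ \left(\sum_{r=0}^{k} (\hat{\beta}^1_r - \beta^1_r) \mathbf{1}_{U^1_{(r)} < X \le U^1_{(r+1)}}\right) \left(\sum_{s=0}^{k} (\hat{\beta}^2_s - \beta^2_s) \mathbf{1}_{U^2_{(s)} < X \le U^2_{(s+1)}}\right) \]

\[ = \sum_{t=0}^{2k} (\hat{\beta}^1_{t_r} - \beta^1_{t_r})(\hat{\beta}^2_{t_s} - \beta^2_{t_s}) \mathbf{1}_{V_{(t)} < X \le V_{(t+1)}} \qquad (14) \]

where $(V_{(1)}, \ldots, V_{(2k)})$ is the ordered statistics of the vector $(U^1_1, \ldots, U^1_k, U^2_1, \ldots, U^2_k)$, $V_{(0)} = 0$, $V_{(2k+1)} = 1$, and

\[ \hat{\beta}^1_{t_r} = \hat{\beta}^1_r \text{ and } \beta^1_{t_r} = \beta^1_r \quad \text{if } ]V_{(t)}, V_{(t+1)}] \subset ]U^1_{(r)}, U^1_{(r+1)}] \]

\[ \hat{\beta}^2_{t_s} = \hat{\beta}^2_s \text{ and } \beta^2_{t_s} = \beta^2_s \quad \text{if } ]V_{(t)}, V_{(t+1)}] \subset ]U^2_{(s)}, U^2_{(s+1)}] \]

For $l = 1, 2$ and $j = 0, \ldots, k$, we define $\hat{p}^l_j = \#\{i : U^l_{(j)} < X_i \le U^l_{(j+1)}\} / n$.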
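Now, let us give some details for the first term of (14), denoted by $S_1(X)$. Without loss of generality, we suppose that $V_{(1)} = U^1_{(1)}$ (i.e. $U^1_{(1)} < U^2_{(1)}$). So,

\[ S_1(X) = \frac{1}{n\hat{p}^1_0 \, n\hat{p}^2_0} \sum_{i_1 : 0 < X_{i_1} \le U^1_{(1)}} \; \sum_{i_2 : 0 < X_{i_2} \le U^2_{(1)}} (Y_{i_1} - \beta^1_0)(Y_{i_2} - \beta^2_0) \mathbf{1}_{0 < X \le U^1_{(1)}} \]

If we denote by $E^{1,2}[\cdot]$ the conditional expectation $E[\cdot \mid (\mathbf{1}_{0 < X_{i_1} \le U^1_{(1)}})_{1 \le i_1 \le n}, (\mathbf{1}_{0 < X_{i_2} \le U^2_{(1)}})_{1 \le i_2 \le n}]$, we have:

\[ E[S_1(X) \mid U^1, U^2] = E\left[p^1_0 \, E^{1,2}\left[\frac{1}{n\hat{p}^1_0 \, n\hat{p}^2_0} \sum_{i_1} \sum_{i_2} (Y_{i_1} - \beta^1_0)(Y_{i_2} - \beta^2_0)\right]\right] \]

but

\[ E^{1,2}[(Y_{i_1} - \beta^1_0)(Y_{i_2} - \beta^2_0)] = 0 \quad \text{for } i_1 \ne i_2 \]

because $Y_{i_1}$ and $Y_{i_2}$ are independent. Hence:

\[ E[S_1(X) \mid U^1, U^2] = E\left[p^1_0 \, E^{1,2}\left[\frac{1}{n\hat{p}^1_0 \, n\hat{p}^2_0} \sum_{i : 0 < X_i \le U^1_{(1)}} (Y_i - \beta^1_0)(Y_i - \beta^2_0)\right]\right] \]

\[ = E\left[p^1_0 \, E^{1}\left[\frac{1}{n\hat{p}^1_0 \, n\hat{p}^2_0} \sum_{i : 0 < X_i \le U^1_{(1)}} E[(Y_i - \beta^1_0)(Y_i - \beta^2_0) \mid 0 < X_i \le U^1_{(1)}]\right]\right] \]

where $E^{1}[\cdot]$ denotes the conditional expectation $E[\cdot \mid (\mathbf{1}_{0 < X_i \le U^1_{(1)}})_{1 \le i \le n}]$.

Now, as

\[ E[(Y_i - \beta^1_0)(Y_i - \beta^2_0) \mid 0 < X_i \le U^1_{(1)}] = E[(Y - \beta^1_0)(Y - \beta^2_0) \mid 0 < X \le U^1_{(1)}] \]

for all $i$, and

\[ E[(Y - \beta^1_0)(Y - \beta^2_0) \mid 0 < X \le V_{(1)}] = \sigma^2 + (\sigma_0^{d,1,2})^2 \]

where

\[ (\sigma_0^{d,1,2})^2 = E[(s(X) - \tilde{s}_{U^1}(X))(s(X) - \tilde{s}_{U^2}(X)) \mid 0 < X \le V_{(1)}] \]

we get

\[ E[S_1(X) \mid U^1, U^2] = p^1_0 \, E\left[\frac{1}{n\hat{p}^2_0}\right] (\sigma^2 + (\sigma_0^{d,1,2})^2). \]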
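If we suppose in addition that $V_{(2)} = U^2_{(1)}$, we similarly get for the second term of (14):

\[ E[S_2(X) \mid U^1, U^2] = E[(\hat{s}_{U^1}(X) - \tilde{s}_{U^1}(X))(\hat{s}_{U^2}(X) - \tilde{s}_{U^2}(X)) \mathbf{1}_{U^1_{(1)} < X \le U^2_{(1)}} \mid U^1, U^2] \]

\[ = q_2 \, E\left[\frac{n\hat{q}_2}{n\hat{p}^1_1 \, n\hat{p}^2_0}\right] (\sigma^2 + (\sigma_1^{d,1,2})^2) \]

where

\[ q_2 = P(V_{(1)} < X \le V_{(2)}) = P(U^1_{(1)} < X \le U^2_{(1)}) \]

and

\[ n\hat{q}_2 = \#\{i : V_{(1)} < X_i \le V_{(2)}\}. \]

Since $]V_{(1)}, V_{(2)}]$ is included in $]U^1_{(1)}, U^1_{(2)}]$, we have $\hat{q}_2 \le \hat{p}^1_1$, so:

\[ E[S_2(X) \mid U^1, U^2] \le q_2 \, E\left[\frac{1}{n\hat{p}^2_0}\right] (\sigma^2 + (\sigma_1^{d,1,2})^2). \]

Finally, by summing the two terms $S_1(X)$ and $S_2(X)$, we deduce that

\[ E[S_1(X) + S_2(X) \mid U^1, U^2] \le p^2_0 \, E\left[\frac{1}{n\hat{p}^2_0}\right] \left(\sigma^2 + (\sigma_0^{d,1,2})^2 + (\sigma_1^{d,1,2})^2\right). \]

In conclusion, we succeeded in bounding the sum of the first two terms of (14) by an expression very close to $\mathrm{Var}_j$ (defined in (12)). The only difference comes from the fact that instead of $(\sigma_0^d)^2$ we have $(\sigma_0^{d,1,2})^2 + (\sigma_1^{d,1,2})^2$. But as we saw in the proof of Proposition 2 these terms are negligible, so $p^2_0 E\left[\frac{1}{n\hat{p}^2_0}\right] (\sigma^2 + (\sigma_0^{d,1,2})^2 + (\sigma_1^{d,1,2})^2)$ is of the same order as $\mathrm{Var}_j$.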

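We can easily generalize this fact by proving the following lemma.

We denote by $S_j(X)$ the $j$-th term of (14), i.e. $S_j(X) = (\hat{s}_{U^1}(X) - \tilde{s}_{U^1}(X))(\hat{s}_{U^2}(X) - \tilde{s}_{U^2}(X)) \mathbf{1}_{V_{(j)} < X \le V_{(j+1)}}$.

Lemma 1 Let $r$ be in $\{0, \ldots, k\}$ and denote by $t$, $t'$ the integers such that

\[ U^1_{(r)} = V_{(t)} < V_{(t+1)} < \cdots < V_{(t'+1)} = U^1_{(r+1)} \qquad (15) \]

then

\[ E\left[\sum_{j=t}^{t'} S_j(X) \mid U^1, U^2\right] \le p^1_r \, E\left[\frac{1}{n\hat{p}^1_r}\right] (\sigma^2 + (\Sigma_r^{d,1,2})^2) \]

where $(\Sigma_r^{d,1,2})^2 = \sum_{j=t}^{t'} (\sigma_j^{d,1,2})^2$.

Indeed for all $j \in \{t, t+1, \ldots, t'\}$,

\[ E[S_j(X) \mid U^1, U^2] \le q_j \, E\left[\frac{1}{n\hat{p}^1_r}\right] (\sigma^2 + (\sigma_j^{d,1,2})^2) \]

where

\[ q_j = P(V_{(j)} < X \le V_{(j+1)}) \]

and

\[ (\sigma_j^{d,1,2})^2 = E[(s(X) - \tilde{s}_{U^1}(X))(s(X) - \tilde{s}_{U^2}(X)) \mid V_{(j)} < X \le V_{(j+1)}]. \]

Thus,

\[ E\left[\sum_{j=t}^{t'} S_j(X) \mid U^1, U^2\right] \le P(V_{(t)} < X \le V_{(t'+1)}) \, E\left[\frac{1}{n\hat{p}^1_r}\right] (\sigma^2 + (\Sigma_r^{d,1,2})^2). \]

From relation (15) we have $P(V_{(t)} < X \le V_{(t'+1)}) = p^1_r$, which concludes the proof of Lemma 1.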



Therefore, we can upper bound the initial sum (14) of $2k + 1$ terms by a sum of $k + 1$ terms of the same order as $\mathrm{Var}_j$ only involving intervals of the partition $U^1$. At this stage, we get an upper bound for the variance of a forest which is of the same order as the variance of a tree. But we can do better. With similar arguments, we can prove the following lemma:

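Lemma 2 If there exist $r$ and $s$ such that

\[ U^2_{(s)} < U^1_{(r)} < U^1_{(r+1)} < U^1_{(r+2)} < U^2_{(s+1)} \]

the expression

\[ E[(\hat{s}_{U^1}(X) - \tilde{s}_{U^1}(X))(\hat{s}_{U^2}(X) - \tilde{s}_{U^2}(X)) \mathbf{1}_{U^1_{(r)} < X \le U^1_{(r+2)}} \mid U^1, U^2] \]

is upper bounded by

\[ p^2_s \, E\left[\frac{1}{n\hat{p}^2_s}\right] (\sigma^2 + (\Sigma_s^{d,1,2})^2) \]

where $(\Sigma_s^{d,1,2})^2 = (\sigma_{r+s}^{d,1,2})^2 + (\sigma_{r+s+1}^{d,1,2})^2$.

Indeed,

\[ E[(\hat{s}_{U^1}(X) - \tilde{s}_{U^1}(X))(\hat{s}_{U^2}(X) - \tilde{s}_{U^2}(X)) \mathbf{1}_{U^1_{(r)} < X \le U^1_{(r+1)}} \mid U^1, U^2] = p^1_r \, E\left[\frac{1}{n\hat{p}^2_s}\right] (\sigma^2 + (\sigma_{r+s}^{d,1,2})^2) \]

and

\[ E[(\hat{s}_{U^1}(X) - \tilde{s}_{U^1}(X))(\hat{s}_{U^2}(X) - \tilde{s}_{U^2}(X)) \mathbf{1}_{U^1_{(r+1)} < X \le U^1_{(r+2)}} \mid U^1, U^2] = p^1_{r+1} \, E\left[\frac{1}{n\hat{p}^2_s}\right] (\sigma^2 + (\sigma_{r+s+1}^{d,1,2})^2). \]

Finally, since $p^1_r + p^1_{r+1} \le p^2_s$, $(\sigma_{r+s}^{d,1,2})^2 \le (\sigma_{r+s}^{d,1,2})^2 + (\sigma_{r+s+1}^{d,1,2})^2$ and $(\sigma_{r+s+1}^{d,1,2})^2 \le (\sigma_{r+s}^{d,1,2})^2 + (\sigma_{r+s+1}^{d,1,2})^2$, the result is obtained by summing the two terms.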


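As in Proposition 1 we replace all $p_t E\left[\frac{1}{n\hat{p}_t}\right]$ by their estimates $\frac{1}{n}(1 + \delta_{n,p_t})$.

By repeatedly applying this lemma for all intervals, we can upper bound

\[ E[(\hat{s}_{U^1}(X) - \tilde{s}_{U^1}(X))(\hat{s}_{U^2}(X) - \tilde{s}_{U^2}(X)) \mid U^1, U^2] \]

by a sum of $N_{12}$ terms of the form $(1 + \delta_{n,p_t})(\sigma^2 + (\Sigma_t^{d,1,2})^2)$, where $p_t$ denotes, for some $j \in \{0, \ldots, k\}$, either $p^1_j$ or $p^2_j$ depending on whether we are in the situation of Lemma 1 or Lemma 2, $N_{12} = k + 1 - M_{1,2}$ and

\[ M_{1,2} = \sum_{r=1}^{k-2} \sum_{s=1}^{k-1} \mathbf{1}_{U^2_{(s)} < U^1_{(r)} < U^1_{(r+1)} < U^1_{(r+2)} < U^2_{(s+1)}}. \]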



This concludes the proof of Proposition 5. Now, using the fact that we deal with uniform partitions, we manage to prove the following corollary.

Corollary 3 If $k \to +\infty$ and $k/n \to 0$ as $n \to +\infty$, $\mu > 0$ and $s$ is $C$-Lipschitz, we have,

\[ C(\hat{s}_{U^1}, \hat{s}_{U^2}) \le \frac{3}{4} \frac{\sigma^2 (k+1)}{n} + o\left(\frac{k}{n}\right). \]

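Because of the simple draws of random partitions, the number $M_{1,2}$ is explicitly computable (we know the distribution of the two ordered statistics) and its expectation is shown to be equivalent to $\frac{1}{4}(k + 1)$ as $k$ tends to $+\infty$ (see Lemma 3 below).

As in Proposition 2, we have to prove that all terms of the sum are negligible compared to the constant one $\sigma^2$. To deal with the fact that the number of terms in the sum is now random, we use the following simple inequality:

\[ E\left[\sum_{t=0}^{N_{12}-1} \left(\sigma^2 \delta_{n,p_t} + (\Sigma_t^{d,1,2})^2 + (\Sigma_t^{d,1,2})^2 \delta_{n,p_t}\right)\right] \le \sum_{t=0}^{k} \left(E[\sigma^2 \delta_{n,p_t}] + E[(\Sigma_t^{d,1,2})^2] + E[(\Sigma_t^{d,1,2})^2 \delta_{n,p_t}]\right) \]

These quantities are of the same kind as the three last terms in the sum of equation (4). So with the same techniques we get that

\[ \frac{1}{n} E\left[\sum_{t=0}^{N_{12}-1} \left(\sigma^2 \delta_{n,p_t} + (\Sigma_t^{d,1,2})^2 + (\Sigma_t^{d,1,2})^2 \delta_{n,p_t}\right)\right] = o\left(\frac{k}{n}\right). \]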
So, we have

\[ E[(\hat{s}_{U^1}(X) - \tilde{s}_{U^1}(X))(\hat{s}_{U^2}(X) - \tilde{s}_{U^2}(X))] \le \frac{\sigma^2 E[N_{12}]}{n} + o\left(\frac{k}{n}\right). \]

Finally, the following technical result allows us to conclude the proof of Corollary 3 and thus the proof of Theorem 2.



Lemma 3 



2(2fc-l) V (fc + l)(fc-3) 
Hence, 

\[ E[M_{1,2}] = \frac{k+1}{4} + o(k) \quad \text{as } k \to +\infty. \]

We then obtain that

\[ E[N_{12}] = \frac{3}{4}(k + 1) + o(k) \quad \text{as } k \to +\infty. \]
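For small $k$, the expectation of $M_{1,2}$ can be checked by brute force. The sketch below relies on an assumption that is not spelled out in the text: when the two samples are i.i.d. uniform and independent, every interleaving of the $k$ points of $U^1$ and the $k$ points of $U^2$ in the merged ordered sample is equally likely, so the expectation is an average over all $\binom{2k}{k}$ interleavings. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def expected_m12(k: int) -> Fraction:
    """Exact E[M_12] by enumerating all interleavings of the two samples.

    An interleaving is encoded by the positions (0-based, in the merged
    ordered sample of size 2k) occupied by the points of U^1.
    """
    total = 0
    arrangements = 0
    for ones in combinations(range(2 * k), k):
        arrangements += 1
        for r in range(k - 2):
            # Three consecutive points of U^1 with no point of U^2 in between.
            consecutive = ones[r + 2] - ones[r] == 2
            # Number of points of U^2 lying below the r-th point of U^1.
            s = ones[r] - r
            # Require at least one point of U^2 on each side.
            if consecutive and 1 <= s <= k - 1:
                total += 1
    return Fraction(total, arrangements)


small_cases = {k: expected_m12(k) for k in (3, 4)}
```

The enumeration is only practical for small $k$ (the number of interleavings grows quickly), but it gives exact reference values against which a closed-form expression can be compared.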



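Let us prove Lemma 3. By definition of $M_{1,2}$,

\[ E[M_{1,2}] = \sum_{r=1}^{k-2} \sum_{s=1}^{k-1} P(U^2_{(s)} < U^1_{(r)} < U^1_{(r+1)} < U^1_{(r+2)} < U^2_{(s+1)}). \]

As we know the distribution of ordered statistics (see e.g. Section 2.2 of [5]), we can compute the following probability:

\[ P(U^2_{(s)} < U^1_{(r)} < U^1_{(r+1)} < U^1_{(r+2)} < U^2_{(s+1)}) = P(U^2_{(s)} < U^1_{(r)} \text{ and } U^1_{(r+2)} < U^2_{(s+1)}) \]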



k\ k\ {i + s - iy.{2k - (j + s) - 1)1 



j=r+2 i=0 

k r-1 



So, 



iSi:i:(i:(-^<r''))U:("^-V"^'' 

^ ' r=l s=l \i=0 ^ ^ / yj=r+2 ^ 

(fc!)2 ^'^/r-l + s\ /2fc-r-2-s 



(2fc)! 

^ ' r=l s=l 



r — l /V fc — r — 2 



(by elementary properties of binomial coefficients (see e.g. [5] p. 160)) 
k-2 



2k-5 



4(2fc- 1) 



i=0 r=t-fe+2 



t+1 \ ( 2A; - 3 - (t + 1) 

k — Z — r 

2k -3 
k-3 



(by defining t = r + s) 



k-2 



2k-5 



A{2k - 1) 



X/ Pw(2fe-3,t+l,fe-3)(i) - FH(2fc-3,t+l,fc-3)(i - + 1)] 



t=0 



(where $F_{H(N,m,n)}$ denotes the cumulative distribution function of the hypergeometric distribution) 

k-3 



k-2 



4(2fc-l) 



^2 ^ F^(2fc_3,4+i,fc_3)(t) 



fc-2 
2(2fc - 1 

A;-2 



2(2fc- 1) 



k-4 

E 

(=0 



fc-3 



t + 1 
t+1 



2k-3-{t + l) \\ 
k-3- {t + 1) ' ' 



k + 



2k -3 
k-3 



J 